Abstract

We show how any dynamic instantaneous compression algorithm can be converted 
to an asymmetric communication protocol, with which a server with high bandwidth
can help clients with low bandwidth send it messages. Unlike previous authors, we 
do not assume the server knows the messages' distribution, and our protocols are 
the first to use only one round of communication for each message. 

1 Introduction

Internet users usually download more than they upload, and many technologies 
have asymmetric bandwidth — greater from servers to clients than from clients 
to servers. Adler and Maggs [3] considered whether a server can use its greater
bandwidth to help clients send it messages. They proved it can, assuming it 
knows the messages' distribution. We argue this assumption is often both 
unwarranted and, fortunately, unnecessary. 

Suppose a number of clients want to send messages to a server. At any point, 
the server knows all the messages it has received so far; each client only knows 
its own messages and does not overhear communication between other clients 
and the server. Thus, the server may be able to construct a good code but 
the clients individually cannot. Adler and Maggs assumed the server, after 
receiving a sample of messages, can accurately estimate the distribution of all 
the messages. This assumption let them simplify the problem: Can the server 
help a single client send it a message drawn from a distribution known to 
the server? Given a representative sample of messages and a protocol for this 
simpler problem, the server can just repeat the protocol for each remaining 
message. In fact, it can even do this in parallel. 

Adler and Maggs gave protocols for the simpler problem in which the server
uses its knowledge to reduce the expected number of bits the client sends 
to roughly the entropy of the distribution. Their work has been improved 
and extended by several authors [5,9,13], whose results are summarized
in Table 1¹ and used in the Infranet anti-censorship system [7,8]. However,
while implementing Infranet, Wang [12] found the distribution of the messages 
(webpage requests) changed over time — the sample was unreliable. 

Table 1

Suppose a server tries to help a client send it one of n messages, chosen according to 
a distribution with entropy H that is known to the server but not the client. Adler 
and Maggs [3], Watkinson, Adler and Fich [13], Ghazizadeh, Ghodsi and Saberi [9] 
and Bose, Krizanc, Langerman and Morin [5] gave protocols for this problem with 
expected-case upper bounds as shown; the last three protocols take a parameter 
k > 1. This table is based on one given by Bose et al. but we use a different 
notation. 

2 Dynamic compression and asymmetric communication 

In this section we show how algorithms for dynamic instantaneous data
compression (e.g., dynamic Huffman coding [10] and move-to-front
compression [1]) can be converted to dynamic asymmetric communication
protocols. By dynamic protocols, we mean ones that do not need the server to
know the messages' distribution.

1 Table 1 does not include a recent paper by Adler [1], in which he considered
a harder version of the original problem with many clients: Can the server take
advantage of correlations between messages? He showed it can, but used the even
stronger assumption that the server knows the probability distribution over entire
sequences of messages.

For dynamic instantaneous compression (also called prefix-free compression),
an encoder makes a single pass over a string S and writes a codeword after
reading each character; a decoder can later make a single pass over the
encoding and write a character of S after reading each codeword. Implicitly or
explicitly, the encoder and decoder each maintain a binary tree, called a
code-tree, in which left edges are labelled with 0s, right edges are labelled
with 1s and leaves are labelled with the characters in the alphabet; the binary
string on the path from the root to a leaf is the codeword for that leaf's
label. When the encoder reads a character a, it writes the codeword for a and
updates its code-tree; when the decoder reads the codeword for a, it writes a
and updates its code-tree. Notice that, as long as the encoder and decoder
initialize and update their code-trees according to the same rule, the
encoder's code-tree when writing a codeword and the decoder's code-tree when
reading that codeword are the same, so the decoder always writes the same
characters the encoder has read.
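
The encoder and decoder loop described above can be sketched in Python as
follows. This is only an illustration under stated assumptions: the update rule
(rebuilding a Huffman code-tree from running counts that all start at 1) is
chosen for brevity and is not the dynamic Huffman algorithm cited above, the
names build_code, encode and decode are hypothetical, and the alphabet is
assumed to contain at least two characters.

```python
import heapq
from itertools import count


def build_code(freq):
    """Return {character: codeword} for a Huffman code-tree built from freq.

    Ties are broken deterministically, so an encoder and a decoder holding
    the same counts always obtain the same code-tree.
    """
    tiebreak = count()
    heap = [(f, next(tiebreak), {s: ""}) for s, f in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}         # left edges: 0
        merged.update({s: "1" + c for s, c in right.items()})  # right edges: 1
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def encode(text, alphabet):
    """Single pass: write the codeword for each character, then update."""
    freq = {a: 1 for a in alphabet}
    out = []
    for ch in text:
        out.append(build_code(freq)[ch])
        freq[ch] += 1  # same update rule as the decoder
    return "".join(out)


def decode(bits, alphabet):
    """Single pass: read one codeword, write its character, then update."""
    freq = {a: 1 for a in alphabet}
    out = []
    pos = 0
    while pos < len(bits):
        inverse = {c: s for s, c in build_code(freq).items()}
        end = pos + 1
        # The code is prefix-free, so the first prefix that is a codeword
        # is the codeword the encoder wrote.
        while bits[pos:end] not in inverse:
            end += 1
            if end > len(bits):
                raise ValueError("truncated or corrupt encoding")
        ch = inverse[bits[pos:end]]
        out.append(ch)
        freq[ch] += 1  # same update rule as the encoder
        pos = end
    return "".join(out)
```

Because both sides start from the same counts and apply the same update after
every character, decode(encode(S, A), A) returns S for any string S over the
alphabet A.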

Theorem 1 Suppose a number of clients want to send a sequence S of messages,
drawn from a set of size n, to a server. Furthermore, suppose there is a
dynamic instantaneous compression algorithm that, when applied to S, produces
a B-bit encoding. Then for any k > 1, there is a dynamic asymmetric
communication protocol with which the server sends at most $O(n^{1/k} \log n)$
bits to each client and the clients send, in total, at most $kB + 2|S|$ bits.

PROOF. The server builds a code-tree T in the same way the encoder would.
For each message s in S, the server truncates T at depth
$\lceil (\log n)/k \rceil$ to obtain a tree T' (some or all of whose leaves may
not be labelled), then sends T' to the client that has s. Since T' has at most
$2^{\lceil (\log n)/k \rceil} < 2n^{1/k}$ leaves, the server can send it to the
client in $O(n^{1/k} \log n)$ bits.

The client examines T' and, if it finds s labelling a leaf, responds with 1
followed by the codeword for s according to T' (and T); if not, it responds with
0 followed by the $\lceil \log n \rceil$-bit representation of s's index in the
set of possible messages. Notice that, in both cases, the server receives enough
information to recover s; once it has done this, the server updates T in the
same way the encoder would. Thus, T is always the same as if maintained by the
encoder.

Let b be the length of the codeword for s in T. If
$b \le \lceil (\log n)/k \rceil$, then the client sends $b + 1$ bits; otherwise,
the client sends $\lceil \log n \rceil + 1 < kb + 2$ bits. Since the compression
algorithm encodes S in B bits, all the clients together send, in total, at most
$kB + 2|S|$ bits. □
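
The protocol in the proof can be simulated with the following Python sketch,
which checks the bound on the clients' bits for one run. Several things here
are assumptions made for illustration: the compression algorithm is a Huffman
code rebuilt from running counts (any dynamic instantaneous algorithm could be
substituted), the truncated tree T' is represented only by its labelled leaves
as a dictionary, the bits the server spends sending T' are not counted, and all
function names are hypothetical.

```python
import heapq
import math
from itertools import count


def build_code(freq):
    """Return {message: codeword} for a Huffman code-tree built from freq."""
    tiebreak = count()
    heap = [(f, next(tiebreak), {s: ""}) for s, f in sorted(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]


def truncate(code, depth):
    """Labelled leaves of T that survive truncation at the given depth."""
    return {s: c for s, c in code.items() if len(c) <= depth}


def client_reply(s, truncated, index_bits, messages):
    """1 + codeword if s labels a leaf of T', else 0 + fixed-length index."""
    if s in truncated:
        return "1" + truncated[s]
    return "0" + format(messages.index(s), "0{}b".format(index_bits))


def server_decode(reply, truncated, messages):
    """Recover s from the client's single reply."""
    if reply[0] == "1":
        inverse = {c: s for s, c in truncated.items()}
        return inverse[reply[1:]]
    return messages[int(reply[1:], 2)]


def run_protocol(S, messages, k):
    """Return (messages recovered, total client bits, encoder's B bits)."""
    n = len(messages)
    index_bits = math.ceil(math.log2(n))
    depth = math.ceil(math.log2(n) / k)
    freq = {m: 1 for m in messages}  # server's state, as the encoder's
    received, client_bits, encoder_bits = [], 0, 0
    for s in S:
        code = build_code(freq)           # the server's code-tree T
        t_prime = truncate(code, depth)   # what the server sends
        reply = client_reply(s, t_prime, index_bits, messages)
        got = server_decode(reply, t_prime, messages)
        received.append(got)
        client_bits += len(reply)
        encoder_bits += len(code[s])      # what the plain encoder would write
        freq[got] += 1                    # update T as the encoder would
    return received, client_bits, encoder_bits
```

For example, with messages = list(range(16)), k = 2 and a sequence S dominated
by a few messages, run_protocol returns the original sequence, and the total
number of client bits is at most k * B + 2 * len(S), where B is the third
returned value.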



3 Incompressibility and asymmetric communication 

An advantage of our conversion in Theorem 1, from dynamic instantaneous
compression to dynamic asymmetric communication, is that the resulting
protocols use only one round of communication for each message; i.e., the server
sends a truncated code-tree and the client responds with either a codeword or
an index. Reducing the number of rounds needed is useful because, as Adler,
Demaine, Harvey and Pătrașcu [2] wrote:

Any time savings obtained from reducing the number of bits sent by the 
client could easily be lost by the extra latency cost induced by multiple 
rounds in the protocol, particularly in long-distance networks, such as
satellites, where communication has very high latency.

However, in the same paper, they proved a lower bound for protocols that use
few rounds. They considered the simpler asymmetric communication problem,
in which the server knows the distribution (with entropy H) over the n
possible messages and there is only one client. Lower bounds for this simpler
problem also hold for dynamic asymmetric communication. They proved that
protocols that use $o\left(\frac{\log \log n}{\log \log \log n}\right)$ rounds
with high probability, and with which the client is expected to send at most
$O(H + 1)$ bits, cannot have a $2^{(\log n)^{1 - \epsilon}}$ upper bound on the
number of bits the server sends, for any $\epsilon > 0$.²

For the special case of single-round protocols, a stronger, earlier lower bound 
was proved by Adler and Maggs [3]: Protocols with which the client is expected
to send at most kH bits cannot have an $n^{1/(20k)} \log n$ upper bound on the
number of bits the server sends. We now prove a nearly tight lower bound for 
this special case, assuming transmissions are self-delimiting; we show afterward 
that assumption is not necessary. 

Theorem 2 There does not exist a single-round asymmetric communication
protocol with which, given a probability distribution P with entropy H over a
set of n possible messages, the server sends $O(n^{1/k - \epsilon})$ bits in the
worst case and the client sends at most $kH + o(\log n)$ bits in the expected
case, for any k > 1 and $\epsilon > 0$.

PROOF. For the sake of a contradiction, assume there does exist such a
protocol. Let $S = s_1, \ldots, s_m$ be a sequence of
$m = n^{1/k - \epsilon/2}$ messages chosen uniformly at random and let P be the
normalized distribution of messages in S. By assumption, the server sends
$O(n^{1/k - \epsilon}) \subset o(m)$ bits and, by the definition of entropy,

$$H \le \log m = \left(\frac{1}{k} - \frac{\epsilon}{2}\right) \log n .$$

Let a' denote the client's response when it has message a. Since P is the
normalized distribution of the messages in S, the length of
$s_1' \cdots s_m'$ is m times the expected length of the client's transmission,
which is at most

$$m \left[kH + o(\log n)\right] \le \left(1 - \frac{k\epsilon}{2}\right) m \log n + o(m \log n)$$

bits.

Since transmissions are self-delimiting, we can store S as the server's
transmission followed by $s_1' \cdots s_m'$; thus, we can store S in
$\left(1 - \frac{k\epsilon}{2}\right) m \log n + o(m \log n)$ bits. However, by
a simple counting argument, storing S takes $m \log n$ bits in the average
case, a contradiction. □

2 This does not contradict the second row of Table 1: Adler and Maggs' second
protocol uses O(1) rounds in the expected case but not with high probability.
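
Written out as a single chain, the counting in the proof of Theorem 2 is as
follows. The side condition $\epsilon < 2/k$, which makes m grow with n, is an
assumption added here for the illustration and is not stated in the proof.

```latex
\begin{align*}
  m &= n^{1/k - \epsilon/2},
  \qquad
  H \le \log m = \left(\frac{1}{k} - \frac{\epsilon}{2}\right)\log n, \\[4pt]
  \underbrace{O\!\left(n^{1/k - \epsilon}\right)}_{\text{server, } \subset\, o(m)}
  + \underbrace{m\bigl(kH + o(\log n)\bigr)}_{\text{client responses}}
  &\le o(m) + m\,k\left(\frac{1}{k} - \frac{\epsilon}{2}\right)\log n
      + o(m \log n) \\
  &= \left(1 - \frac{k\epsilon}{2}\right) m \log n + o(m \log n) \\
  &< m \log n \quad \text{for all sufficiently large } n .
\end{align*}
```

The last line is strictly below the $m \log n$ bits that the counting argument
requires on average for m messages drawn uniformly from n possibilities, which
is the contradiction the proof relies on.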

Even if the channel between the server and the client indicates the end of
transmissions (so the client's response need not be self-delimiting),
Theorem 2 still applies. To see why, assume there is a single-round protocol for
such channels, with which the server sends $O(n^{1/k - \epsilon})$ bits and the
client is expected to send at most $kH + o(\log n)$ bits. We can make all
transmissions self-delimiting by prefacing each transmission by its length
encoded in Elias' gamma code [6]; in this code, the codeword for the positive
integer i has length $2\lfloor \log i \rfloor + 1$. Notice the resulting
protocol still takes a single round; since
$O(n^{1/k - \epsilon} + \log n^{1/k - \epsilon}) = O(n^{1/k - \epsilon})$, the
server's self-delimiting transmission is still $O(n^{1/k - \epsilon})$ bits; by
Jensen's Inequality and because $H \le \log n$, we have
$kH + 2\log(kH) + 1 + o(\log n) = kH + o(\log n)$ and the expected length
of the client's self-delimiting transmission is still at most
$kH + o(\log n)$ bits. However, Theorem 2 forbids the existence of such a
protocol.
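
The length-prefixing step can be illustrated with a short Python sketch of
Elias' gamma code. The function names are hypothetical, bit strings are
represented as Python strings of "0" and "1" for readability, and payloads are
assumed to be non-empty because the gamma code is defined only for positive
integers.

```python
def gamma_encode(i):
    """Elias gamma codeword for a positive integer i.

    The codeword is floor(log2 i) zeros followed by the binary
    representation of i, so its length is 2 * floor(log2 i) + 1.
    """
    if i < 1:
        raise ValueError("gamma code is defined for positive integers only")
    binary = bin(i)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits, pos=0):
    """Decode one gamma codeword starting at pos; return (value, next pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def frame(payload):
    """Make a transmission self-delimiting by prefixing its length."""
    if not payload:
        raise ValueError("empty payloads are not handled in this sketch")
    return gamma_encode(len(payload)) + payload


def unframe(stream, pos=0):
    """Read one framed transmission; return (payload, next pos)."""
    length, pos = gamma_decode(stream, pos)
    return stream[pos:pos + length], pos + length
```

Concatenated framed transmissions can be split again without any end-of-message
signal from the channel: calling unframe repeatedly, each time starting at the
position it last returned, recovers the payloads in order.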